Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor(plugin-server): Add flat person override table and writer #19220

Merged
merged 21 commits into from
Dec 18, 2023

Conversation

tkaemming
Copy link
Contributor

@tkaemming tkaemming commented Dec 8, 2023

Problem

Following up on #19112, this adds a "flat" person overrides model/table that contains all attributes necessary to represent override details, intended to replace the existing overrides and mapping tables.

The new deferred overrides approach gives us stronger control over the order of operations in which overrides are processed, which in turn allows us to loosen the table constraints that are needed to ensure that the overrides data remains in a consistent state during updates. Since we no longer need to lean on the existing exclusion constraint to ensure integrity of this data during concurrent updates, we can move to a more straightforward schema design that does not require the use of the override mapping table to implement that constraint.

Whether the existing path (overrides & mappings; the default) or flat path is used can be controlled with the POE_DEFERRED_WRITES_USE_FLAT_OVERRIDES configuration setting to the overrides writer. This setting only is used if POE_DEFERRED_WRITES_ENABLED is also enabled.

The related code for the squash workflow is at #19347.

Migration

This isn't exactly a one-way door, but ideally we'd only change enable this setting once we're ready to commit to this path forward. My general thinking is that we'd:

  1. Have already moved overrides to the deferred path (with POE_DEFERRED_WRITES_ENABLED, from feat(plugin-server): Add capability to use deferred overrides writer and worker #19112)
  2. Stop the overrides writer deployment, new overrides will buffer to the pending table
  3. Move the existing overrides data to the new schema with a big INSERT INTO … SELECT from the old tables to the new one — this could be reversed if necessary, but ideally we would want to avoid the complexity of doing so
  4. Start the overrides writer deployment, new overrides that were buffered will now be flushed to the new flat table
  5. Remove the now unused (non-flat) overrides code, removing interim abstractions added here as appropriate
  6. Drop the now unused (non-flat) overrides and mapping table at our leisure

Table Rewrite Query

INSERT INTO posthog_flatpersonoverride (
    team_id,
    old_person_id,
    override_person_id,
    oldest_event,
    version
)
SELECT
    override.team_id as team_id,
    old_person.uuid as old_person_id,
    override_person.uuid as override_person_id,
    oldest_event,
    version
FROM posthog_personoverride override
LEFT OUTER JOIN posthog_personoverridemapping old_person
    ON override.team_id = old_person.team_id AND override.old_person_id = old_person.id
LEFT OUTER JOIN posthog_personoverridemapping override_person
    ON override.team_id = override_person.team_id AND override.override_person_id = override_person.id

How did you test this code?

  • Added tests to test existing mapping and new flat overrides writers in isolation to ensure consistency between the implementations.
  • Added another person overrides tests mode to test deferred overrides using the flat overrides writer in the plugin server.

@tkaemming tkaemming force-pushed the poe-deferred-override-worker-flat branch 2 times, most recently from 579a9ca to 18d5f5d Compare December 15, 2023 02:47
Copy link
Contributor Author

@tkaemming tkaemming left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is arguably a bit overdone, but I wanted to use the existing tests as much as possible to ensure that all of the various implementations were consistent with each other while making these changes as possible.

Comment on lines +710 to +732
public async getPersonOverrides(teamId: number): Promise<PersonOverrideDetails[]> {
const { rows } = await this.postgres.query(
PostgresUse.COMMON_WRITE,
SQL`
SELECT
override.team_id,
old_person.uuid as old_person_id,
override_person.uuid as override_person_id,
oldest_event
FROM posthog_personoverride override
LEFT OUTER JOIN posthog_personoverridemapping old_person
ON override.team_id = old_person.team_id AND override.old_person_id = old_person.id
LEFT OUTER JOIN posthog_personoverridemapping override_person
ON override.team_id = override_person.team_id AND override.override_person_id = override_person.id
WHERE override.team_id = ${teamId}
`,
undefined,
'getPersonOverrides'
)
return rows.map((row) => ({
...row,
oldest_event: DateTime.fromISO(row.oldest_event),
}))
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method was pulled in from tests so that I could swap the entire backing implementation at once without having to coordinate it across multiple abstractions. This method is really only intended to be used in tests, though.

(I did rewrite the query here because I found the other one challenging to read, though looking back at it now, I do think that the CTE having the predicate associated with it versus being specified here 3x is a nice idea.)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense to me. I have a minor preference for naming things like fooForTests or something to avoid footguns. It'd be pretty weird for someone to reach for this... but...

Comment on lines +791 to +810
{
value: JSON.stringify({
team_id: overrideDetails.team_id,
old_person_id: overrideDetails.old_person_id,
override_person_id: overrideDetails.override_person_id,
oldest_event: castTimestampOrNow(overrideDetails.oldest_event, TimestampFormat.ClickHouse),
merged_at: castTimestampOrNow(mergedAt, TimestampFormat.ClickHouse),
version: 0,
}),
},
...transitiveUpdates.map(({ old_person_id, version, oldest_event }) => ({
value: JSON.stringify({
team_id: overrideDetails.team_id,
old_person_id: old_person_id,
override_person_id: overrideDetails.override_person_id,
oldest_event: castTimestampOrNow(oldest_event, TimestampFormat.ClickHouse),
merged_at: castTimestampOrNow(mergedAt, TimestampFormat.ClickHouse),
version: version,
}),
})),
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could be made to be less duplicative with the non-flat version, but I don't think it's worth the investment given that the non-flat implementations days should be numbered at this point.

)
return rows.map((row) => ({
...row,
team_id: parseInt(row.team_id), // XXX: pg returns bigint as str (reasonably so)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not a good idea in general terms, but like the other getPersonOverrides above, this is only intended to be used in tests.

On second thought, there's probably no harm in replacing parseInt with BigInt, though? It seems like it should be safe…?

> BigInt('10') == 10
true

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, seems harmless (either way?).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, Jest doesn't like it after all,

diff --git a/plugin-server/src/worker/ingestion/person-state.ts b/plugin-server/src/worker/ingestion/person-state.ts
index f49ee1ad33..e33fcad987 100644
--- a/plugin-server/src/worker/ingestion/person-state.ts
+++ b/plugin-server/src/worker/ingestion/person-state.ts
@@ -832,7 +832,7 @@ export class FlatPersonOverrideWriter {
         )
         return rows.map((row) => ({
             ...row,
-            team_id: parseInt(row.team_id), // XXX: pg returns bigint as str (reasonably so)
+            team_id: BigInt(row.team_id), // pg returns bigint as str
             oldest_event: DateTime.fromISO(row.oldest_event),
         }))
     }

causes this error:

  ● person overrides writer: flat › handles direct overrides

    expect(received).toEqual(expected) // deep equality

    - Expected  - 1
    + Received  + 1

      Array [
        Object {
          "old_person_id": "018c6f1a-36df-0000-4ee6-e4ef7823a3f8",
          "oldest_event": "1969-12-31T16:00:00.000-08:00",
          "override_person_id": "018c6f1a-36df-0001-a5b9-21531ca2319e",
    -     "team_id": 474353818,
    +     "team_id": 474353818n,
        },
      ]

      2152 |         })
      2153 |
    > 2154 |         expect(await writer.getPersonOverrides(teamId)).toEqual([{ ...defaults, ...override }])
           |                                                         ^
      2155 |     })
      2156 |
      2157 |     it('handles transitive overrides', async () => {

      at Object.toEqual (tests/worker/ingestion/person-state.test.ts:2154:57)

Not sure why, as the values compare just fine:

> 486884794 == 486884794n
true

Looks like that equality test comes from somewhere in here, so it's not using the equality operator directly: https://github.com/jestjs/jest/blob/e54c0ebb048e10331345dbe99f8ec07654a43f1c/packages/expect-utils/src/jasmineUtils.ts#L63-L212

teamId is used all over in these tests so it seems like a whole different can of worms to try and change it to BigInt consistently, so going to avoid falling down that rabbit hole and leave it as-is, I guess. (posthog_team.id isn't a bigint on the table schema, either, FWIW -- at least as it exists in the repository, I didn't check the production schema.)

Copy link
Contributor

@bretthoerner bretthoerner Dec 15, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

posthog_team.id isn't a bigint on the table schema, either, FWIW

Oh, weird. To be honesty when I read your comment I was 🤨 that we're planning for >2billion teams already. Not that it isn't nice to be prepared, but this seems pretty low priority to think about, lol.

edit: Actually the Django Person model doesn't even claim to use BigInt if I'm reading correctly. I get the feeling this was a copy-pasta in the old Overrides where it makes more sense for Persons themselves to be BigInt.

Copy link
Contributor Author

@tkaemming tkaemming Dec 15, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I just grabbed it from PersonOverrideMapping myself. I just assumed there was a "always use bigserial for primary keys" to avoid surprises later regardless of how probable they'd be to become real issues.

Knowing that, maybe it makes sense to drop these back down to a plain old 4 byte integer for consistency with the actual team primary key? Though these tables should ideally stay pretty small through overrides, so maybe it doesn't matter much at all? 🤷‍♂️

I do like the idea of 2 billion teams though.

Comment on lines -137 to -144
class PendingPersonOverride(models.Model):
id = models.BigAutoField(auto_created=True, primary_key=True, serialize=False, verbose_name="ID")
team_id = models.BigIntegerField()
old_person_id = models.UUIDField()
override_person_id = models.UUIDField()
oldest_event = models.DateTimeField()


Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very silly, but I just moved this from between PersonOverrideMapping and PersonOverride as those tables will no longer be needed after we move to the new flat table. In retrospect, this should have been before both or after both, it doesn't make sense for me to have put it in between them.

posthog/models/person/person.py Show resolved Hide resolved
@tkaemming tkaemming marked this pull request as ready for review December 15, 2023 04:58
@tkaemming tkaemming requested a review from a team December 15, 2023 04:58
posthog/models/person/person.py Show resolved Hide resolved
posthog/models/person/person.py Show resolved Hide resolved
Comment on lines +710 to +732
public async getPersonOverrides(teamId: number): Promise<PersonOverrideDetails[]> {
const { rows } = await this.postgres.query(
PostgresUse.COMMON_WRITE,
SQL`
SELECT
override.team_id,
old_person.uuid as old_person_id,
override_person.uuid as override_person_id,
oldest_event
FROM posthog_personoverride override
LEFT OUTER JOIN posthog_personoverridemapping old_person
ON override.team_id = old_person.team_id AND override.old_person_id = old_person.id
LEFT OUTER JOIN posthog_personoverridemapping override_person
ON override.team_id = override_person.team_id AND override.override_person_id = override_person.id
WHERE override.team_id = ${teamId}
`,
undefined,
'getPersonOverrides'
)
return rows.map((row) => ({
...row,
oldest_event: DateTime.fromISO(row.oldest_event),
}))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense to me. I have a minor preference for naming things like fooForTests or something to avoid footguns. It'd be pretty weird for someone to reach for this... but...

)
return rows.map((row) => ({
...row,
team_id: parseInt(row.team_id), // XXX: pg returns bigint as str (reasonably so)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, seems harmless (either way?).

posthog/models/person/person.py Show resolved Hide resolved
Comment on lines +235 to +239
* A person should only appear as an "old" person at most once for a given
team (as appearing more than once would imply they were merged into
multiple people.)
* A person cannot be merged into themselves (i.e. be both the "old" and
"override" person within a given row.)
Copy link
Contributor Author

@tkaemming tkaemming Dec 16, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know we don't expect to be able to enforce the third item in this list (below this comment) any longer, but I do wonder if it would make sense to try to continue to enforce these (second point is a trivial check constraint, first point would be a probably-not-too-expensive unique constraint over team_id and old_person_id.)

Even if there isn't a playbook about what to do in the case that the constraint validation fails (other than just generally panic), it's probably better to know something has gone wrong (and stop overrides processing, even if that means we'll be collecting more potentially erroneous pending overrides) versus not knowing at all and just letting it happen?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, these both seem cheap and worthwhile to me. 🤷‍♂️

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added 'em here: 7b8176b

all merges that have occurred, but have not yet been integrated into the
ClickHouse events table through a squash operation. Once the effects of a
merge have been integrated into the events table, the associated override
record can be deleted from this table.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If keeping these around is cheap it seems worthwhile to me. There's something nice about having a source of truth in a DB like PG.

But if I'm forgetting something about how squash works then ignore me, it's more of an idle thought.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They are currently deleted after a squash happens, #19347 probably helps shine a lot more light on how that process works -- probably worth picking up this conversation over there? I could see the benefits of not deleting from this table, but would need to give it a closer look to check if that would be safe to do so.

Copy link
Contributor

@bretthoerner bretthoerner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice comments, thanks. I know I'll be glad they are in there in checks watch 3 weeks when I forget how any of this works.

@tkaemming tkaemming merged commit a4a694c into master Dec 18, 2023
72 checks passed
@tkaemming tkaemming deleted the poe-deferred-override-worker-flat branch December 18, 2023 20:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants